A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
Authors
Abstract
Phonetic segmentation is the process of splitting speech into distinct phonetic units. Human experts routinely perform this task manually, analyzing auditory and visual cues in analysis software, which is an extremely time-consuming process. Methods for automatic segmentation exist, but they are not always sufficiently accurate. To improve automatic segmentation, we need to model it as closely as possible on manual segmentation. This corpus is an effort to capture human segmentation behavior by recording experts as they perform a segmentation task. We believe these data will highlight the important aspects of manual segmentation, which can then be exploited to improve the accuracy of automatic segmentation.
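To make the task concrete, the sketch below shows a deliberately naive automatic segmenter: it places boundaries wherever short-time energy crosses a fixed threshold. This is only an illustrative stand-in for the acoustic cues experts consult, not the method of the paper; the function name, frame length, and threshold are assumptions for the example.

```python
import numpy as np

def segment_by_energy(signal, frame_len=160, threshold=0.01):
    """Naive automatic segmentation: place a boundary wherever the
    short-time energy crosses a threshold. Real forced aligners use
    far richer acoustic models; this only illustrates the task."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = energy > threshold
    # A boundary is emitted at every frame where the active/silent
    # state flips relative to the previous frame.
    return [i * frame_len for i in range(1, n_frames)
            if active[i] != active[i - 1]]

# Toy signal: 0.5 s silence, 1 s of a 220 Hz tone, 0.5 s silence.
sr = 16000
sig = np.concatenate([np.zeros(sr // 2),
                      0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr),
                      np.zeros(sr // 2)])
print(segment_by_energy(sig))  # → [8000, 24000], i.e. 0.5 s and 1.5 s
```

Even on this toy signal, the gap between such heuristics and expert boundary placement is exactly the kind of discrepancy a corpus of expert behavior could help quantify.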
Similar papers
Acoustic cues identifying phonetic transitions for speech segmentation
The quality of corpus-based text-to-speech (TTS) systems depends strongly on the consistency of boundary placements during phonetic alignments. Expert human transcribers use visually represented acoustic cues in order to consistently place boundaries at phonetic transitions according to a set of conventions. We present some features commonly (and informally) used as aid when performing manual s...
The OFAI Multi-Modal Task Description Corpus
The OFAI Multimodal Task Description Corpus (OFAI-MMTD Corpus) is a collection of dyadic teacher-learner (human-human and human-robot) interactions. The corpus is multimodal and tracks the communication signals exchanged between interlocutors in task-oriented scenarios including speech, gaze and gestures. The focus of interest lies on the communicative signals conveyed by the teacher and which ...
Achieving Multimodal Cohesion during Intercultural Conversations
How do English as a lingua franca (ELF) speakers achieve multimodal cohesion on the basis of their specific interests and cultural backgrounds? From a dialogic and collaborative view of communication, this study focuses on how verbal and nonverbal modes cohere together during intercultural conversations. The data include approximately 160-minute transcribed video recordings of ELF interactions ...
The Vernissage Corpus: a Multimodal Human-robot-interaction Dataset
We introduce a new multimodal interaction dataset with extensive annotations in a conversational Human-RobotInteraction (HRI) scenario. It has been recorded and annotated to benchmark many relevant perceptual tasks, towards enabling a robot to converse with multiple humans, such as speaker localization, key word spotting, speech recognition in audio domain; tracking, pose estimation, nodding, v...
Atypical Gaze Behavior in Children with High Functioning Autism During an Active Balance Task
Background. Unusual gaze behavior in children with autism spectrum disorder (ASD) was reported very early in the literature. Objectives. The current study examined gaze behavior in children with ASD and typically developing (TD) children while performing an active balance task on the Wii balance board. Methods: 8 children (male) diagnosed with high-functioning ASD and 9 TD children (3 female, ...
Journal: CoRR
Volume: abs/1712.04798
Pages: -
Published: 2017